
A novel adaptive weight selection algorithm for multi-objective multi-agent reinforcement learning



Abstract

To solve multi-objective problems, multiple reward signals are often scalarized into a single value and further processed using established single-objective problem-solving techniques. While the field of multi-objective optimization has made many advances in applying scalarization techniques to obtain good solution trade-offs, the utility of applying these techniques in the multi-objective multi-agent learning domain has not yet been thoroughly investigated. Agents learn the value of their decisions by linearly scalarizing their reward signals at the local level, while still achieving acceptable system-wide behaviour. However, the non-linear relationship between the weighting parameters of the scalarization function and the learned policy makes the discovery of system-wide trade-offs time-consuming. Our first contribution is a thorough analysis of well-known scalarization schemes within the multi-objective multi-agent reinforcement learning setting. The analysed approaches intelligently explore the weight space in order to find a wider range of system trade-offs. In our second contribution, we propose a novel adaptive weight algorithm which interacts with the underlying local multi-objective solvers and allows for a better coverage of the Pareto front. Our third contribution is the experimental validation of our approach by learning bi-objective policies in self-organising smart camera networks. We note that our algorithm (i) explores the objective space faster on many problem instances, (ii) obtains solutions that exhibit a larger hypervolume, and (iii) achieves a greater spread in the objective space.
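The linear scalarization the abstract refers to — collapsing a multi-objective reward vector into a single scalar via a weight vector — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function and variable names are our own.

```python
import numpy as np

def scalarize(rewards, weights):
    """Linearly scalarize a multi-objective reward vector.

    `rewards`: per-objective reward signals observed by an agent.
    `weights`: relative importance of each objective; normalised here
    to a convex combination so weight settings are comparable.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so weights sum to 1
    return float(weights @ rewards)   # weighted sum of objectives

# A bi-objective reward under two different weight settings: shifting
# the weights changes which trade-off the learner is steered towards.
print(scalarize([1.0, 0.0], [0.5, 0.5]))  # 0.5
print(scalarize([1.0, 0.0], [0.9, 0.1]))  # 0.9
```

Because the mapping from weights to the learned policy is non-linear, as the abstract notes, sweeping such weight settings uniformly does not yield a uniform coverage of the Pareto front — which is the gap the proposed adaptive weight algorithm targets.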
